91 research outputs found

    On the white, the black, and the many shades of gray in between: Our reply to Van Ravenzwaaij and Wagenmakers (2021)

    In 2019 we wrote an article (Tendeiro & Kiers, 2019) in Psychological Methods on null hypothesis Bayesian testing and its workhorse, the Bayes factor. Recently, van Ravenzwaaij and Wagenmakers (2021) offered a response to our piece, also in this journal. Although we welcome their thought-provoking remarks on our article, we concluded that van Ravenzwaaij and Wagenmakers (2021) contains too many issues to leave unaddressed. In this article we both defend the main premises of our original article and put the contribution of van Ravenzwaaij and Wagenmakers (2021) under critical appraisal. Our hope is that this exchange between scholars decisively contributes toward a better understanding among psychologists of null hypothesis Bayesian testing in general and of the Bayes factor in particular.
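    The Bayes factor at the center of this exchange is simply a ratio of marginal likelihoods. As a minimal illustration (our own sketch, not material from either article), consider a binomial test of H0: theta = 0.5 against H1: theta ~ Beta(a, b); both marginal likelihoods then have closed forms. The function name and the default Beta(1, 1) prior are illustrative choices.

```python
from math import lgamma, log, exp

def betaln(a, b):
    # log of the Beta function, computed via log-gamma for stability
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bayes_factor_binom(k, n, a=1.0, b=1.0):
    """BF10 for k successes in n trials: H0 fixes theta = 0.5,
    H1 assigns theta a Beta(a, b) prior. The binomial coefficient
    appears in both marginal likelihoods and cancels in the ratio."""
    log_m0 = n * log(0.5)                              # marginal likelihood under H0
    log_m1 = betaln(k + a, n - k + b) - betaln(a, b)   # marginal likelihood under H1
    return exp(log_m1 - log_m0)
```

    For instance, 50 successes in 100 trials gives BF10 below 1 (the data favor the point null), whereas 80 out of 100 gives overwhelming evidence against it.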

    The Use of Nonparametric Item Response Theory to Explore Data Quality

    The aim of this chapter is to provide insight into a number of commonly used nonparametric item response theory (NIRT) methods and to show how these methods can be used to describe and explore the psychometric quality of questionnaires used in patient-reported outcome measurement and, more generally, typical performance measurement (personality, mood, health-related constructs). NIRT is an extremely valuable tool for preliminary data analysis and for evaluating whether item response data are suitable for parametric IRT modeling. This is particularly useful in the field of typical performance measurement, where the construct being measured is often very different from that in maximum performance measurement (education, intelligence; see Chapter 1 of this handbook). Our basic premise is that there are no “best tools” or “best models” and that the usefulness of psychometric modeling depends on the specific aims of the instrument (questionnaire, test) being used. Most important, however, is that it should be clear to a researcher how sensitive a specific method (for example, DETECT or Mokken scaling) is to the assumptions under investigation. The NIRT literature is not always clear about this, and in this chapter we try to clarify some of these ambiguities.
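    A workhorse of one NIRT method mentioned here, Mokken scaling, is Loevinger's scalability coefficient H. For a pair of dichotomous items, H compares the observed number of Guttman errors (endorsing the less popular item while rejecting the more popular one) with the number expected under independence. The sketch below is our own illustration, not code from the chapter; the function name is ours.

```python
import numpy as np

def item_pair_H(x, y):
    """Loevinger's H for two dichotomous items (1 = endorse).
    H = 1 means no Guttman errors (a perfect scale pattern);
    H = 0 means as many errors as expected under independence."""
    x, y = np.asarray(x), np.asarray(y)
    if x.mean() < y.mean():                     # make x the more popular item
        x, y = y, x
    n = len(x)
    observed = np.sum((x == 0) & (y == 1))      # Guttman errors
    expected = n * (1 - x.mean()) * y.mean()    # errors expected under independence
    return 1 - observed / expected
```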

    On the Practical Consequences of Misfit in Mokken Scaling

    Mokken scale analysis is a popular method for evaluating the psychometric quality of clinical and personality questionnaires and their individual items. Although many empirical papers report the extent to which sets of items form Mokken scales, less attention has been paid to the effects of violating commonly used rules of thumb. In this study, the authors investigated the practical consequences of retaining or removing items whose psychometric properties do not comply with these rules of thumb. Using simulated data, they concluded that items with low scalability had some influence on the reliability of test scores, person ordering and selection, and criterion-related validity estimates. Removing the misfitting items from the scale had, in general, a small effect on the outcomes. Although important outcome variables were fairly robust against scale violations in some conditions, the authors conclude that researchers should not rely exclusively on algorithms that automatically select items. In particular, content validity must be taken into account to build sensible psychometric instruments.
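    The kind of simulation reported here can be sketched in a few lines: generate a scale in which most items measure the latent trait well, add one weakly related (low-scalability) item, and compare internal consistency with and without it. This is our own toy illustration, not the authors' design; the `cronbach_alpha` helper and the loadings are illustrative assumptions.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_persons, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(5)
n = 1000
ability = rng.normal(size=(n, 1))
good = ability + rng.normal(size=(n, 5))     # five items loading well on the trait
bad = 0.1 * ability + rng.normal(size=(n, 1))  # one weakly related item
scale = np.hstack([good, bad])

print(f"alpha, full scale:             {cronbach_alpha(scale):.3f}")
print(f"alpha, misfitting item removed: {cronbach_alpha(good):.3f}")
```

    With these (made-up) parameters the weak item drags alpha down slightly, echoing the paper's finding that single misfitting items have real but modest effects.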

    Admission testing for higher education: A multi-cohort study on the validity of high-fidelity curriculum-sampling tests

    We investigated the validity of curriculum-sampling tests for admission to higher education in two studies. Curriculum-sampling tests mimic representative parts of an academic program to predict future academic achievement. In the first study, we investigated the predictive validity of a curriculum-sampling test for first-year academic achievement across three cohorts of undergraduate psychology applicants, and for academic achievement after three years in one cohort. We also studied the relationship between the test scores and enrollment decisions. In the second study, we examined the cognitive and noncognitive construct saturation of curriculum-sampling tests in a sample of psychology students. The curriculum-sampling tests showed high predictive validity for first-year and third-year academic achievement, mostly comparable to the predictive validity of high school GPA. In addition, curriculum-sampling test scores showed incremental validity over high school GPA. Applicants who scored low on the curriculum-sampling tests more often decided not to enroll in the program, indicating that curriculum-sampling admission tests may also promote self-selection. Contrary to expectations, the curriculum-sampling test scores did not show any relationship with cognitive ability, but there were some indications of noncognitive saturation, mostly for perceived test competence. Thus, curriculum-sampling tests can serve as efficient admission tests that yield high predictive validity. Furthermore, when self-selection or student-program fit are major objectives of admission procedures, curriculum-sampling tests may be preferred over, or used in addition to, high school GPA.
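    Incremental validity of the kind reported here is usually quantified as the gain in R² when the test score is added to a regression that already contains high school GPA. A toy sketch on simulated data (the variable names and effect sizes are our own, not the study's):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
hs_gpa = rng.normal(size=n)
test = 0.5 * hs_gpa + rng.normal(size=n)               # test correlates with GPA
achievement = 0.4 * hs_gpa + 0.3 * test + rng.normal(size=n)

def r_squared(X, y):
    """R-squared of an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_gpa = r_squared(hs_gpa, achievement)
r2_both = r_squared(np.column_stack([hs_gpa, test]), achievement)
print(f"R2 GPA only: {r2_gpa:.3f}  R2 GPA + test: {r2_both:.3f}  "
      f"increment: {r2_both - r2_gpa:.3f}")
```

    The positive R² increment is what "incremental validity over high school GPA" means operationally.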

    Worked-out examples of the adequacy of Bayesian optional stopping

    The practice of sequentially testing a null hypothesis as data are collected, until the null hypothesis is rejected, is known as optional stopping. It is well known that optional stopping is problematic in the context of p value-based null hypothesis significance testing: The false-positive rate quickly exceeds the single test's significance level. However, the state of affairs under null hypothesis Bayesian testing, where p values are replaced by Bayes factors, has, perhaps surprisingly, been much less consensual. Rouder (2014) used simulations to defend the use of optional stopping under null hypothesis Bayesian testing. The idea behind these simulations is closely related to the idea of sampling from prior predictive distributions. Deng et al. (2016) and Hendriksen et al. (2020) have provided mathematical evidence that optional stopping under null hypothesis Bayesian testing is indeed adequate under some conditions. These papers are, however, exceedingly technical for most researchers in the applied social sciences. In this paper, we provide some mathematical derivations concerning Rouder's approximate simulation results for the two Bayesian hypothesis tests that he considered. The key idea is to consider the probability distribution of the Bayes factor, regarded as a random variable across repeated sampling. This paper therefore offers an intuitive perspective on the literature, and we believe it is a valid contribution toward understanding the practice of optional stopping in the context of Bayesian hypothesis testing.
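    The p-value side of the argument is easy to reproduce: testing after every new observation and stopping at the first p < .05 inflates the false-positive rate far beyond the nominal level. The sketch below is our own (a two-sided z-test with known variance, interim looks from n = 10 onward), not the simulations from Rouder (2014) or from this paper.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(1)

def rejects_somewhere(n_max=100, n_min=10, alpha=0.05):
    """One simulated study under H0 (mu = 0, sigma = 1 known), running a
    two-sided z-test after every new observation from n_min to n_max.
    Returns True if the null is rejected at any interim look."""
    csum = np.cumsum(rng.normal(size=n_max))
    for n in range(n_min, n_max + 1):
        z = csum[n - 1] / sqrt(n)               # z statistic at sample size n
        if erfc(abs(z) / sqrt(2)) < alpha:      # erfc(|z|/sqrt(2)) = two-sided p
            return True
    return False

n_sims = 2000
fp_rate = sum(rejects_somewhere() for _ in range(n_sims)) / n_sims
print(f"false-positive rate under optional stopping: {fp_rate:.3f}")
```

    With looks at every sample size from 10 to 100 the realized rate lands far above the nominal 5%, which is exactly the inflation the abstract refers to.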

    Simplicity transformations for three-way arrays with symmetric slices, and applications to Tucker-3 models with sparse core arrays

    Tucker three-way PCA and Candecomp/Parafac are two well-known methods of generalizing principal component analysis to three-way data. Candecomp/Parafac yields component matrices A (e.g., for subjects or objects), B (e.g., for variables), and C (e.g., for occasions) that are typically unique up to jointly permuting and rescaling columns. Tucker-3 analysis, on the other hand, has full transformational freedom. That is, the fit does not change when A, B, and C are postmultiplied by nonsingular transformation matrices, provided that the inverse transformations are applied to the so-called core array G̲. This freedom of transformation can be used to create a simple structure in A, B, C, and/or in G̲. This paper deals exclusively with the latter possibility. It revolves around the question of how a core array, or, in fact, any three-way array, can be transformed to have a maximum number of zero elements. Direct applications are in Tucker-3 analysis, where simplicity of the core may facilitate the interpretation of a Tucker-3 solution, and in constrained Tucker-3 analysis, where hypotheses involving sparse cores are taken into account. In the latter case, it is important to know what degree of sparseness can be attained as a tautology, by using the transformational freedom. In addition, simplicity transformations have proven useful as a mathematical tool for examining the rank and generic or typical rank of three-way arrays. So far, a number of simplicity results have been attained, pertaining to arrays sampled randomly from continuous distributions. These results do not apply to three-way arrays with symmetric slices in one direction. The present paper offers a number of simplicity results for arrays with symmetric slices of order 2×2, 3×3, and 4×4. Some generalizations to higher orders are also discussed. As a mathematical application, the problem of determining the typical rank of 4×3×3 and 5×3×3 arrays with symmetric slices is revisited, using a sparse form with only 8 out of 36 elements nonzero in the former case and 10 out of 45 in the latter, which can be attained almost surely for such arrays. The issue of maximal simplicity of the targets to be presented is addressed, either by formal proofs or by relying on simulation results.
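    The transformational freedom that the paper exploits is easy to verify numerically: postmultiplying A by any nonsingular S while applying S⁻¹ to the first mode of the core leaves the reconstructed array, and hence the fit, unchanged. A small sketch with illustrative dimensions (our own, not from the paper; a random Gaussian S is almost surely nonsingular):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 2))
B = rng.normal(size=(4, 2))
C = rng.normal(size=(3, 2))
G = rng.normal(size=(2, 2, 2))                 # core array

def tucker3(A, B, C, G):
    """Reconstruct X_ijk = sum_pqr A_ip B_jq C_kr G_pqr."""
    return np.einsum('ip,jq,kr,pqr->ijk', A, B, C, G)

X = tucker3(A, B, C, G)

S = rng.normal(size=(2, 2))                    # nonsingular transformation
A2 = A @ S                                     # transform the component matrix...
G2 = np.einsum('pq,qrs->prs', np.linalg.inv(S), G)  # ...and counter-rotate the core

assert np.allclose(X, tucker3(A2, B, C, G2))   # reconstruction is unchanged
```

    The same counter-rotation works for B and C on the second and third modes of G̲, which is what allows zeros to be created in the core "for free".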

    Using structural equation modeling to study traits and states in intensive longitudinal data

    Traditionally, researchers have used time series and multilevel models to analyze intensive longitudinal data. However, these models do not directly address traits and states, which conceptualize the stability and variability implicit in longitudinal research, and they do not explicitly take measurement error into account. An alternative that overcomes these drawbacks is to consider structural equation models (state-trait SEMs) for longitudinal data that represent traits and states as latent variables. Most of these models are encompassed in latent state-trait (LST) theory. These state-trait SEMs can be problematic when the number of measurement occasions increases: because they require the data in wide format, the models quickly become overparameterized and run into nonconvergence issues. For these reasons, multilevel versions of state-trait SEMs have been proposed, which require the data in long format. To study how suitable state-trait SEMs are for intensive longitudinal data, we carried out a simulation study comparing the traditional single-level and multilevel versions of three state-trait SEMs: the multistate-singletrait (MSST) model, the common and unique trait-state (CUTS) model, and the trait-state-occasion (TSO) model. We also included an empirical application. Our results indicated that the TSO model performed best in both the simulated and the empirical data. To conclude, we highlight the usefulness of state-trait SEMs for studying the psychometric properties of the questionnaires used in intensive longitudinal research. Yet these models still have multiple limitations, some of which might be overcome by extending them to more general frameworks.
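    The trait-state distinction these models formalize can be illustrated with a bare-bones variance decomposition: a stable person-level trait plus occasion-specific state fluctuations, with the two components recovered from between- and within-person variability. This is a deliberately simplified sketch (no measurement model, parameter values our own), far simpler than the MSST, CUTS, or TSO models compared in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n_persons, n_times = 200, 50
trait = rng.normal(scale=np.sqrt(0.6), size=(n_persons, 1))        # stable component
state = rng.normal(scale=np.sqrt(0.4), size=(n_persons, n_times))  # occasion-specific
y = trait + state                                                  # observed scores

# within-person variance estimates the state variance (true value 0.4);
# variance of person means, minus its sampling part, estimates the
# trait variance (true value 0.6)
within = y.var(axis=1, ddof=1).mean()
between = y.mean(axis=1).var(ddof=1) - within / n_times

print(f"state variance ~ {within:.2f}, trait variance ~ {between:.2f}")
```

    A state-trait SEM does the same decomposition with latent variables, which additionally separates state variance from measurement error.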

    First- and second-order derivatives for CP and INDSCAL

    In this paper we provide the means to analyse the second-order differential structure of optimization functions for CANDECOMP/PARAFAC and INDSCAL. Closed-form formulas are given under two types of constraint: unit-length columns or orthonormality of two of the three component matrices. Some numerical problems that might occur during the computation of the Jacobian and Hessian matrices are addressed. The use of these matrices is illustrated in three applications.
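    Closed-form derivatives such as these are conveniently checked against finite differences. The sketch below does this for the gradient of the unconstrained CP least-squares loss with respect to A (our own illustration; the paper's formulas additionally cover the constrained cases and second-order structure):

```python
import numpy as np

rng = np.random.default_rng(4)
I, J, K, R = 4, 3, 3, 2
X = rng.normal(size=(I, J, K))
A = rng.normal(size=(I, R))
B = rng.normal(size=(J, R))
C = rng.normal(size=(K, R))

def cp_loss(A, B, C):
    """Squared-error loss of the CP model: ||X - [[A, B, C]]||^2."""
    model = np.einsum('ir,jr,kr->ijk', A, B, C)
    return np.sum((X - model) ** 2)

def cp_grad_A(A, B, C):
    """Closed-form gradient of cp_loss with respect to A."""
    resid = np.einsum('ir,jr,kr->ijk', A, B, C) - X
    return 2 * np.einsum('ijk,jr,kr->ir', resid, B, C)

# central finite differences, entry by entry
eps = 1e-6
num = np.zeros_like(A)
for i in range(I):
    for r in range(R):
        Ap = A.copy(); Ap[i, r] += eps
        Am = A.copy(); Am[i, r] -= eps
        num[i, r] = (cp_loss(Ap, B, C) - cp_loss(Am, B, C)) / (2 * eps)

assert np.allclose(num, cp_grad_A(A, B, C), atol=1e-4)
```

    The same check extends to Hessian formulas by differencing the analytic gradient instead of the loss.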